R Markdown
install.packages('fivethirtyeightdata', repos = 'https://fivethirtyeightdata.github.io/drat/', type = 'source')
## Installing package into '/stor/home/sm69929/R/x86_64-pc-linux-gnu-library/3.6'
## (as 'lib' is unspecified)
## Warning in install.packages("fivethirtyeightdata", repos = "https://
## fivethirtyeightdata.github.io/drat/", : installation of package
## 'fivethirtyeightdata' had non-zero exit status
library(fivethirtyeight)
## Some larger datasets need to be installed separately, like senators and
## house_district_forecast. To install these, we recommend you install the
## fivethirtyeightdata package by running:
## install.packages('fivethirtyeightdata', repos =
## 'https://fivethirtyeightdata.github.io/drat/', type = 'source')
bad_drivers <- bad_drivers
partisan_lean <- partisan_lean_state
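The `partydrivers` data frame used throughout the rest of this report is presumably built by joining `bad_drivers` with `partisan_lean` by state and splitting the PVI score into a party and an amount. A minimal sketch, assuming the PVI column is a character score like "R+17" (the column name `pvi_538` and the `+` separator are assumptions; adjust to match `partisan_lean`):

```r
library(dplyr)
library(tidyr)

# Sketch: combine the two datasets by state, then split the PVI score
# (e.g. "R+17") into a party factor and a numeric amount. The column
# name `pvi_538` is an assumption about partisan_lean's layout.
partydrivers <- bad_drivers %>%
  inner_join(partisan_lean, by = "state") %>%
  separate(pvi_538, into = c("pvi_party", "pvi_amount"), sep = "\\+") %>%
  mutate(pvi_party = as.factor(pvi_party),
         pvi_amount = as.numeric(pvi_amount))
```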
library(tidyverse)
## ── Attaching packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2 ✓ purrr 0.3.4
## ✓ tibble 3.0.3 ✓ dplyr 1.0.1
## ✓ tidyr 1.1.1 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.5.0
## ── Conflicts ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(GGally)
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
library(cluster)
Wrangling
#Filter
partydrivers %>% filter(pvi_party=="D", insurance_premiums<935)
## # A tibble: 11 x 10
## state pvi_party pvi_amount num_drivers perc_speeding perc_alcohol
## <chr> <fct> <dbl> <dbl> <int> <int>
## 1 Cali… D 24 12 35 28
## 2 Colo… D 1 13.6 37 28
## 3 Hawa… D 36 17.5 54 41
## 4 Illi… D 13 12.8 36 34
## 5 Maine D 5 15.1 38 30
## 6 Minn… D 2 9.6 23 29
## 7 New … D 7 18.4 19 27
## 8 Oreg… D 9 12.8 33 26
## 9 Verm… D 24 13.6 30 30
## 10 Virg… D 0 12.7 19 27
## 11 Wash… D 12 10.6 42 33
## # … with 4 more variables: perc_not_distracted <int>, perc_no_previous <int>,
## # insurance_premiums <dbl>, losses <dbl>
#Select
partydrivers %>% select(state, contains("perc"))
## # A tibble: 50 x 5
## state perc_speeding perc_alcohol perc_not_distracted perc_no_previous
## <chr> <int> <int> <int> <int>
## 1 Alabama 39 30 96 80
## 2 Alaska 41 25 90 94
## 3 Arizona 35 28 84 96
## 4 Arkansas 18 26 94 95
## 5 California 35 28 91 89
## 6 Colorado 37 28 79 95
## 7 Connecticut 46 36 87 82
## 8 Delaware 38 30 87 99
## 9 Florida 21 29 92 94
## 10 Georgia 19 25 95 93
## # … with 40 more rows
#Arrange
partydrivers %>% arrange(desc(losses))
## # A tibble: 50 x 10
## state pvi_party pvi_amount num_drivers perc_speeding perc_alcohol
## <chr> <fct> <dbl> <dbl> <int> <int>
## 1 Loui… R 17 20.5 35 33
## 2 Mary… D 23 12.5 34 32
## 3 Okla… R 34 19.9 32 29
## 4 Conn… D 11 10.8 46 36
## 5 Cali… D 24 12 35 28
## 6 New … D 13 11.2 16 28
## 7 Texas R 17 19.4 40 38
## 8 Miss… R 15 17.6 15 31
## 9 Tenn… R 28 19.5 21 29
## 10 Penn… R 1 18.2 50 31
## # … with 40 more rows, and 4 more variables: perc_not_distracted <int>,
## # perc_no_previous <int>, insurance_premiums <dbl>, losses <dbl>
#Mutate, Group_By
partydrivers %>% group_by(pvi_party) %>% mutate(mean2 = cummean(perc_alcohol)) %>% arrange(desc(mean2))
## # A tibble: 50 x 11
## # Groups: pvi_party [2]
## state pvi_party pvi_amount num_drivers perc_speeding perc_alcohol
## <chr> <fct> <dbl> <dbl> <int> <int>
## 1 Illi… D 13 12.8 36 34
## 2 Mass… D 29 8.2 23 35
## 3 Hawa… D 36 17.5 54 41
## 4 Maine D 5 15.1 38 30
## 5 Mary… D 23 12.5 34 32
## 6 Mich… D 1 14.1 24 28
## 7 Minn… D 2 9.6 23 29
## 8 New … D 13 11.2 16 28
## 9 New … D 7 18.4 19 27
## 10 Rhod… D 26 11.1 34 38
## # … with 40 more rows, and 5 more variables: perc_not_distracted <int>,
## # perc_no_previous <int>, insurance_premiums <dbl>, losses <dbl>, mean2 <dbl>
#Summary Statistics
partydrivers %>% summarize(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))
## # A tibble: 1 x 8
## pvi_amount num_drivers perc_speeding perc_alcohol perc_not_distra…
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 16.9 16.0 31.7 30.8 85.6
## # … with 3 more variables: perc_no_previous <dbl>, insurance_premiums <dbl>,
## # losses <dbl>
partydrivers %>% summarize(across(where(is.numeric), ~ sd(.x, na.rm = TRUE)))
## # A tibble: 1 x 8
## pvi_amount num_drivers perc_speeding perc_alcohol perc_not_distra…
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 11.5 3.91 9.73 5.16 15.2
## # … with 3 more variables: perc_no_previous <dbl>, insurance_premiums <dbl>,
## # losses <dbl>
partydrivers %>% summarize(across(where(is.numeric), ~ var(.x, na.rm = TRUE)))
## # A tibble: 1 x 8
## pvi_amount num_drivers perc_speeding perc_alcohol perc_not_distra…
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 133. 15.3 94.6 26.6 230.
## # … with 3 more variables: perc_no_previous <dbl>, insurance_premiums <dbl>,
## # losses <dbl>
partydrivers %>% summarize(across(where(is.numeric), ~ quantile(.x, na.rm = TRUE)))
## # A tibble: 5 x 8
## pvi_amount num_drivers perc_speeding perc_alcohol perc_not_distra…
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0 8.2 13 16 10
## 2 7 12.8 23 28 82.5
## 3 17 15.6 34 30 88
## 4 24 18.6 38 33 94.8
## 5 47 23.9 54 44 99
## # … with 3 more variables: perc_no_previous <dbl>, insurance_premiums <dbl>,
## # losses <dbl>
partydrivers %>% summarize(across(where(is.numeric), ~ min(.x, na.rm = TRUE)))
## # A tibble: 1 x 8
## pvi_amount num_drivers perc_speeding perc_alcohol perc_not_distra…
## <dbl> <dbl> <int> <int> <int>
## 1 0 8.2 13 16 10
## # … with 3 more variables: perc_no_previous <int>, insurance_premiums <dbl>,
## # losses <dbl>
partydrivers %>% summarize(across(where(is.numeric), ~ max(.x, na.rm = TRUE)))
## # A tibble: 1 x 8
## pvi_amount num_drivers perc_speeding perc_alcohol perc_not_distra…
## <dbl> <dbl> <int> <int> <int>
## 1 47 23.9 54 44 99
## # … with 3 more variables: perc_no_previous <int>, insurance_premiums <dbl>,
## # losses <dbl>
partydrivers %>% summarize(across(where(is.numeric), ~ median(.x, na.rm = TRUE)))
## # A tibble: 1 x 8
## pvi_amount num_drivers perc_speeding perc_alcohol perc_not_distra…
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 17 15.6 34 30 88
## # … with 3 more variables: perc_no_previous <dbl>, insurance_premiums <dbl>,
## # losses <dbl>
partydrivers %>% summarize(across(where(is.numeric), ~ n_distinct(.x, na.rm = TRUE)))
## # A tibble: 1 x 8
## pvi_amount num_drivers perc_speeding perc_alcohol perc_not_distra…
## <int> <int> <int> <int> <int>
## 1 29 44 29 19 25
## # … with 3 more variables: perc_no_previous <int>, insurance_premiums <int>,
## # losses <int>
partydriversonlynum <- partydrivers %>% select_if(is.numeric)
partydriversonlynum %>% cor
## pvi_amount num_drivers perc_speeding perc_alcohol
## pvi_amount 1.000000000 0.28797287 0.101069621 0.14192790
## num_drivers 0.287972873 1.00000000 -0.018663595 0.17578538
## perc_speeding 0.101069621 -0.01866360 1.000000000 0.29140608
## perc_alcohol 0.141927903 0.17578538 0.291406080 1.00000000
## perc_not_distracted 0.108741412 0.05932482 0.128472265 0.05780096
## perc_no_previous 0.004789969 0.06712173 0.006442366 -0.22911225
## insurance_premiums -0.067215556 -0.10465864 0.033770006 0.01517102
## losses -0.037991143 -0.03506761 -0.061579945 -0.08344099
## perc_not_distracted perc_no_previous insurance_premiums
## pvi_amount 0.10874141 0.004789969 -0.067215556
## num_drivers 0.05932482 0.067121733 -0.104658639
## perc_speeding 0.12847227 0.006442366 0.033770006
## perc_alcohol 0.05780096 -0.229112249 0.015171025
## perc_not_distracted 1.00000000 -0.234326992 -0.022855291
## perc_no_previous -0.23432699 1.000000000 0.004128919
## insurance_premiums -0.02285529 0.004128919 1.000000000
## losses -0.06018868 0.041835397 0.652502452
## losses
## pvi_amount -0.03799114
## num_drivers -0.03506761
## perc_speeding -0.06157994
## perc_alcohol -0.08344099
## perc_not_distracted -0.06018868
## perc_no_previous 0.04183540
## insurance_premiums 0.65250245
## losses 1.00000000
partydrivers %>% group_by(pvi_party) %>% summarize(mean = mean(num_drivers), sd = sd(num_drivers))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 2 x 3
## pvi_party mean sd
## <fct> <dbl> <dbl>
## 1 D 12.9 2.58
## 2 R 17.9 3.36
partydrivers %>% group_by(pvi_party) %>% summarise(median = median(perc_alcohol), n = n())
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 2 x 3
## pvi_party median n
## <fct> <int> <int>
## 1 D 30 19
## 2 R 30 31
#Summary statistics Visualizations & Tidying to rearrange wide/long
partydrivers2 <- partydrivers
names(partydrivers2)<-gsub("\\_","",names(partydrivers2))
partydrivers2 %>% summarize_if(is.numeric,.funs = list("mean"=mean,"median"=median, "sd"=sd, "max"=max, "min"=min, "var"=var, "ndistinct" = n_distinct), na.rm=T) %>%
pivot_longer(contains("_"))%>%
separate(name,into=c("Variable","Statistics"), sep="_", convert = T)%>%
pivot_wider(names_from = "Variable",values_from="value")%>% arrange(Statistics)
## # A tibble: 7 x 9
## Statistics pviamount numdrivers percspeeding percalcohol percnotdistract…
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 max 47 23.9 54 44 99
## 2 mean 16.9 16.0 31.7 30.8 85.6
## 3 median 17 15.6 34 30 88
## 4 min 0 8.2 13 16 10
## 5 ndistinct 29 44 29 19 25
## 6 sd 11.5 3.91 9.73 5.16 15.2
## 7 var 133. 15.3 94.6 26.6 230.
## # … with 3 more variables: percnoprevious <dbl>, insurancepremiums <dbl>,
## # losses <dbl>
Using the filter function, I was curious to see which Democratic-leaning states had insurance premiums lower than the national average of $935. Looking at the states, such as California, Colorado, and Maine, there does not seem to be a particular region where the cheaper insurance premiums reside. I used the select function to look at the percentage statistics the dataset included for each state to view information about drivers. I used arrange to sort the losses insurance companies incurred in descending order to see which states were affected the most. I grouped by the leaning party and created a new column holding the cumulative mean of the percentage of alcohol-impaired drivers. I took the summary statistics of the entire dataset, including the mean, median, quantiles, variance, and more. After that, I computed a correlation matrix to see which variables were most strongly related, which were losses and insurance premiums. Finally, I grouped by pvi party to look at the mean and standard deviation of the number of drivers, and the median of the percent alcohol, respectively. It was interesting to see that both the Democratic and Republican states had a median of 30% of drivers impaired by alcohol.
When making my condensed table of summary statistics, I needed to tidy the data to make it less wide. There were repeating columns with the same variable, such as multiple columns of summary statistics like the mean, and multiple columns reusing the same variable, such as numdrivers. To condense this, I used pivot_longer to lengthen the table so the same data is presented vertically. Once the summary statistics were calculated, every column name included an underscore, which made it simple to pivot on that. After I pivoted longer, I used separate to split the column names on the underscore, placing the variable names into a column called Variable and the summary functions into a column called Statistics. I then used pivot_wider to place the original column data from partydrivers2 into columns of their own, with the summary statistics alongside.
Visualizations
Correlation Heatmap
partydrivedf <- partydrivers %>% select_if(is.numeric) %>% cor()
partydrivedf
## pvi_amount num_drivers perc_speeding perc_alcohol
## pvi_amount 1.000000000 0.28797287 0.101069621 0.14192790
## num_drivers 0.287972873 1.00000000 -0.018663595 0.17578538
## perc_speeding 0.101069621 -0.01866360 1.000000000 0.29140608
## perc_alcohol 0.141927903 0.17578538 0.291406080 1.00000000
## perc_not_distracted 0.108741412 0.05932482 0.128472265 0.05780096
## perc_no_previous 0.004789969 0.06712173 0.006442366 -0.22911225
## insurance_premiums -0.067215556 -0.10465864 0.033770006 0.01517102
## losses -0.037991143 -0.03506761 -0.061579945 -0.08344099
## perc_not_distracted perc_no_previous insurance_premiums
## pvi_amount 0.10874141 0.004789969 -0.067215556
## num_drivers 0.05932482 0.067121733 -0.104658639
## perc_speeding 0.12847227 0.006442366 0.033770006
## perc_alcohol 0.05780096 -0.229112249 0.015171025
## perc_not_distracted 1.00000000 -0.234326992 -0.022855291
## perc_no_previous -0.23432699 1.000000000 0.004128919
## insurance_premiums -0.02285529 0.004128919 1.000000000
## losses -0.06018868 0.041835397 0.652502452
## losses
## pvi_amount -0.03799114
## num_drivers -0.03506761
## perc_speeding -0.06157994
## perc_alcohol -0.08344099
## perc_not_distracted -0.06018868
## perc_no_previous 0.04183540
## insurance_premiums 0.65250245
## losses 1.00000000
partydrivedf2 <- partydrivedf %>% as.data.frame()
tidyparty <- partydrivedf2 %>% rownames_to_column("var1") %>%
pivot_longer(-1, names_to="var2", values_to="correlation")
tidyparty %>% ggplot(aes(var1, var2, fill=correlation)) + geom_tile() + scale_fill_gradient2(low="purple", mid="white", high="red") + geom_text(aes(label=round(correlation,2)),color = "black", size = 2)+
theme(axis.text.x = element_text(angle = 90, hjust = 1))+ coord_fixed()+ ggtitle("Correlation Heatmap")

Disregarding the correlation of 1 between each variable and itself, the majority of the plot shows a white, peach, or pink color between variables. The light pink, white, and peach tiles may be considered, but they are not the main focus of the heatmap, as those correlations are close to zero. Correlations are lacking between many of the variables, and those variables do not appear to be related, whether negatively or positively. For example, the correlation between the percentage of drivers without previous accidents and either insurance premiums or pvi amount is nearly 0, so we can assume they do not affect each other. This could mean that when average insurance premiums are decided for a state, the percentage of drivers without previous accidents is not taken into account. Perc_no_previous also has the strongest negative correlations on the heatmap, with percent alcohol and percent not distracted. It would make sense that where there are more drivers who have not had accidents, a high percentage of alcohol-impaired or distracted drivers is less likely, because the drivers tend to be more responsible. The strongest correlation on this heatmap is between the losses incurred by insurance companies and insurance premiums. The strong correlation between the two variables suggests the probable dependency that the premiums have on the financial losses of the company. The only correlation that connects the two datasets is the one between the number of drivers and pvi amount. This could give some confirmation toward the initial question of whether political affiliation or score has an impact on collisions in the state, or vice versa.
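The strongest pairs discussed above can also be pulled out of the tidied correlation table programmatically, rather than read off the heatmap by eye; a short sketch using the tidyparty table built for the heatmap:

```r
library(dplyr)

# Rank variable pairs by absolute correlation; var1 < var2 keeps each
# pair once and drops the diagonal. tidyparty is the long-format
# correlation table created above.
tidyparty %>%
  filter(var1 < var2) %>%
  arrange(desc(abs(correlation))) %>%
  slice_head(n = 3)
```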
Plot 1
partydrivers %>% ggplot(aes(losses, insurance_premiums, color=pvi_party)) + xlab("Losses Incurred Per Insured Drivers Collisons ($)") + ylab("Car Insurance Premiums ($)") + ggtitle("Car Insurance Premiums vs Insurance Company Collision Losses Per Party ") + geom_point()+ theme_bw() +scale_x_continuous(n.breaks=15) + geom_smooth(method = "lm") + scale_color_manual(values = c("#0C0CDE", "#D51717"))
## `geom_smooth()` using formula 'y ~ x'

The graph shows a positive correlation in both of our trendlines. In both Democratic and Republican states, higher losses result in higher premiums, presumably to make up for the losses. Comparing the trendlines between the parties, it appears citizens in Democratic states tend to pay more in insurance premiums overall than those in Republican states. The outliers in blue states tend to fall above the trendline, and the outliers in red states tend to fall below it, so you could potentially be paying higher rates in the blue states. The trendlines do not start at the same point, and the minimum cost tends to be lower in the red states than in the blue states. The confidence intervals at the higher end of losses are lacking in points, so the assumption that this positive, linear correlation would continue must be taken with a grain of salt.
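To quantify the trendlines rather than eyeball them, the same per-party linear fits that geom_smooth(method = "lm") draws can be computed directly; a quick sketch (extracting coefficients inside summarize is one of several ways to do this):

```r
library(dplyr)

# Fit insurance_premiums ~ losses separately for D and R states and
# report the intercept and slope of each trendline.
partydrivers %>%
  group_by(pvi_party) %>%
  summarize(intercept = coef(lm(insurance_premiums ~ losses))[1],
            slope     = coef(lm(insurance_premiums ~ losses))[2])
```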
Plot 2
partydriversinsur <- partydrivers %>%mutate(insurance_rate = case_when(insurance_premiums>1074 ~ "high",
insurance_premiums<=1074 & 744<=insurance_premiums ~ "med",
insurance_premiums<744 ~ "low"))
partydriversinsur
## # A tibble: 50 x 11
## state pvi_party pvi_amount num_drivers perc_speeding perc_alcohol
## <chr> <fct> <dbl> <dbl> <int> <int>
## 1 Alab… R 27 18.8 39 30
## 2 Alas… R 15 18.1 41 25
## 3 Ariz… R 9 18.6 35 28
## 4 Arka… R 24 22.4 18 26
## 5 Cali… D 24 12 35 28
## 6 Colo… D 1 13.6 37 28
## 7 Conn… D 11 10.8 46 36
## 8 Dela… D 14 16.2 38 30
## 9 Flor… R 5 17.9 21 29
## 10 Geor… R 12 15.6 19 25
## # … with 40 more rows, and 5 more variables: perc_not_distracted <int>,
## # perc_no_previous <int>, insurance_premiums <dbl>, losses <dbl>,
## # insurance_rate <chr>
partydriversinsur %>% ggplot(aes(x =pvi_party , y =perc_no_previous , fill=insurance_rate))+
geom_bar(stat="summary", fun=mean, position="dodge") + scale_fill_manual(values=c("blue", "dark green", "purple"),
name="National Insurance Rate",
labels=c("High Rate", "Medium Rate", "Low Rate")) + xlab("Political Party") + ylab("Percentage of Drivers with No Previous Accidents") + ggtitle("Insurance Rate vs Rate of Previous Accidents Per Party")

This barplot answers the question: for the states that have a high, medium, or low insurance rate on average, what percentage of their drivers have no previous accidents? This is grouped by the states that lean Democrat and Republican. Based on the aggregate percentage of drivers with no previous accidents, one could claim that Republican states tend to have fewer accidents than Democratic states. States with medium and low insurance rates on average, regardless of political affiliation, tend to have similar accident histories. In Republican states, a greater share of citizens with few previous accidents pay a high insurance rate. In Democratic states, a smaller share of citizens with previous accidents pay a high insurance rate. There may be a confounding variable behind the disparity between the high-rate-paying Democratic and Republican states.
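The exact heights of the dodged bars above can be checked with a grouped summary; a small sketch:

```r
library(dplyr)

# Mean percentage of drivers with no previous accidents for each
# party / insurance-rate tier, i.e. the numbers the bars display.
partydriversinsur %>%
  group_by(pvi_party, insurance_rate) %>%
  summarize(mean_no_previous = mean(perc_no_previous), .groups = "drop")
```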
Clustering
PAM
clust_dat <-partydrivers %>% select(-state, -pvi_party) %>% scale %>% as.data.frame
pam_dat<-partydrivers%>%select(-state,-pvi_party)
sil_width<-vector()
for(i in 2:10){
pam_fit <- pam(pam_dat, k = i)
sil_width[i] <- pam_fit$silinfo$avg.width
}
ggplot()+geom_line(aes(x=1:10,y=sil_width))+scale_x_continuous(name="k",breaks=1:10)
## Warning: Removed 1 row(s) containing missing values (geom_path).

pam1 <- clust_dat %>% scale %>% pam(k=3)
pamclust <- clust_dat %>% mutate(cluster=as.factor(pam1$clustering))
pamclust %>% ggplot(aes(insurance_premiums, losses, num_drivers, color=cluster )) + geom_point()

library(plotly)
pamclust %>%plot_ly(x= ~insurance_premiums, y = ~losses, z = ~num_drivers, color= ~cluster,
type = "scatter3d", mode = "markers") %>%
layout(autosize = F, width = 900, height = 400)
## Warning: Specifying width/height in layout() is now deprecated.
## Please specify in ggplotly() or plot_ly()
## Warning: `arrange_()` is deprecated as of dplyr 0.7.0.
## Please use `arrange()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
library(GGally)
ggpairs(pamclust, columns=1:8, aes(color=cluster))

pamclust %>% group_by(cluster) %>% summarize_if(is.numeric, mean, na.rm=T)
## # A tibble: 3 x 9
## cluster pvi_amount num_drivers perc_speeding perc_alcohol perc_not_distra…
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 0.232 0.0942 0.511 0.578 0.147
## 2 2 -0.496 0.0315 -0.635 -0.557 -0.295
## 3 3 0.401 -0.304 -0.0356 -0.363 0.214
## # … with 3 more variables: perc_no_previous <dbl>, insurance_premiums <dbl>,
## # losses <dbl>
partydrivers %>% slice(pam1$id.med)
## # A tibble: 3 x 10
## state pvi_party pvi_amount num_drivers perc_speeding perc_alcohol
## <chr> <fct> <dbl> <dbl> <int> <int>
## 1 Miss… R 19 16.1 43 34
## 2 Geor… R 12 15.6 19 25
## 3 Verm… D 24 13.6 30 30
## # … with 4 more variables: perc_not_distracted <int>, perc_no_previous <int>,
## # insurance_premiums <dbl>, losses <dbl>
pam1$silinfo$avg.width
## [1] 0.0630808
plot(pam1, which=2)

In order to do PAM, I created a dataset called clust_dat by extracting the 8 numeric variables I wanted to analyze from my original dataset. It was important to scale my data because the variables were measured on different scales. I used silhouette width to pick my number of clusters: I computed the silhouette width for each k, took the average, viewed the result with ggplot, and chose k from the highest point on the graph. I then ran the pam function on clust_dat with k = 3 and assigned the result to a new variable called pam1. After calculating the number of clusters and visualizing them with ggplot, I created a new data frame called pamclust by taking clust_dat, which has my numeric variables, and using mutate to add a new variable called cluster, built from the clustering vector in pam1. After that, I put it into ggplot, coloring by cluster, to visualize my final cluster solution! The medoid values for clusters 1, 2, and 3 were 0.04, 0.02, and 0.22, respectively. Looking at the clusters, they were not tightly separated from each other and in fact overlapped across the entire scatterplot; they hardly represented clusters at all. I viewed them in plotly, which used 3 of my variables, and then in GGally, which showed all of the variables; GGally did not show any instances of separated clusters. After running PAM, I grouped by cluster and summarized to find the means of each variable. I used the slice function to look at the states most representative of each cluster, which were Missouri, Georgia, and Vermont. I computed the average silhouette width of my PAM fit to see how good the solution was, and got 0.063. I also created a silhouette plot of pam1 to see the per-cluster averages and the overall average silhouette width, which was 0.06. A width that far below 0.25 indicates that no substantial structure was found.
Overall, the structure is completely unusable and uninformative, and our clustering solution is not valid.